gradient tracking
An Improved Analysis of Gradient Tracking for Decentralized Machine Learning
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents, each of which can compute stochastic model updates on their local data. The agents' common goal is to find a model that minimizes the average of all local loss functions. While gradient tracking (GT) algorithms can overcome a key challenge, namely accounting for differences between workers' local data distributions, the known convergence rates for GT algorithms are not optimal with respect to their dependence on the mixing parameter $p$ (related to the spectral gap of the connectivity matrix). We provide a tighter analysis of the GT method in the stochastic strongly convex, convex, and non-convex settings. We improve the dependency on $p$ from $\mathcal{O}(p^{-2})$ to $\mathcal{O}(p^{-1}c^{-1})$ in the noiseless case and from $\mathcal{O}(p^{-3/2})$ to $\mathcal{O}(p^{-1/2}c^{-1})$ in the general stochastic case, where $c \geq p$ is related to the negative eigenvalues of the connectivity matrix (and is a constant in most practical applications). This improvement was possible due to a new proof technique which could be of independent interest.
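For intuition, here is a minimal NumPy sketch of one synchronous gradient-tracking iteration under a doubly stochastic mixing matrix; all names (`W`, `x`, `y`, `grad`, `lr`) are illustrative assumptions, not notation from the paper.

```python
# Minimal sketch of one gradient-tracking (GT) step. Assumes a doubly
# stochastic mixing matrix W and a gradient oracle grad(); all names are
# illustrative, not taken from the paper.
import numpy as np

def gt_step(W, x, y, grad, lr):
    """One synchronous GT iteration over n agents.

    W    : (n, n) doubly stochastic mixing matrix of the network
    x    : (n, d) stacked local models, one row per agent
    y    : (n, d) gradient trackers, initialized to the local gradients
    grad : maps (n, d) stacked models to (n, d) local (stochastic) gradients
    lr   : step size
    """
    g_old = grad(x)                       # gradients at the current models
    x_new = W @ (x - lr * y)              # descend along the tracker, then mix
    y_new = W @ y + grad(x_new) - g_old   # track the network-average gradient
    return x_new, y_new
```

Because `y` tracks the network-average gradient, the method corrects for the mismatch between local and global objectives that plain decentralized SGD suffers from under heterogeneous data.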
Distributed Value Decomposition Networks with Networked Agents
Varela, Guilherme S., Sardinha, Alberto, Melo, Francisco S.
We investigate the problem of distributed training under partial observability, whereby cooperative multi-agent reinforcement learning (MARL) agents maximize the expected cumulative joint reward. We propose distributed value decomposition networks (DVDN) that generate a joint Q-function that factorizes into agent-wise Q-functions. Whereas the original value decomposition networks rely on centralized training, our approach is suitable for domains where centralized training is not possible and agents must learn by interacting with the physical environment in a decentralized manner while communicating with their peers. DVDN overcomes the need for centralized training by locally estimating the shared objective. We contribute two algorithms, DVDN and DVDN (GT), for the heterogeneous and homogeneous agent settings, respectively. Empirically, both algorithms approximate the performance of value decomposition networks in ten MARL tasks across three standard environments, despite the information loss during communication.
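As a reminder of the factorization DVDN inherits from value decomposition networks, the sketch below computes the joint Q-value as a sum of per-agent utilities; the PyTorch names (`q_nets`, `obs`, `actions`) are hypothetical placeholders.

```python
# Sketch of the VDN-style factorization that DVDN builds on:
# Q_tot(o, a) = sum_i Q_i(o_i, a_i), so each agent can act greedily on its
# own Q_i. All names here are hypothetical placeholders.
import torch

def joint_q(q_nets, obs, actions):
    """q_nets: list of per-agent Q-networks; obs[i]: (batch, obs_dim_i);
    actions[i]: (batch,) long tensor of agent i's chosen actions."""
    per_agent = [
        net(o).gather(-1, a.unsqueeze(-1)).squeeze(-1)  # Q_i(o_i, a_i)
        for net, o, a in zip(q_nets, obs, actions)
    ]
    return torch.stack(per_agent).sum(dim=0)            # Q_tot = sum_i Q_i
```

In the distributed setting of the paper, this sum cannot be formed centrally, which is what the local estimation of the shared objective (and the GT variant) addresses.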
Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data
Takezawa, Yuki, Bao, Han, Niwa, Kenta, Sato, Ryoma, Yamada, Makoto
SGD with momentum is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum is Distributed SGD (DSGD) with momentum (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed momentum methods that are more robust to data heterogeneity than DSGDm, although their convergence rates remain dependent on data heterogeneity and deteriorate when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, a momentum method whose convergence rate is provably independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the setting where the objective function is non-convex and the stochastic gradient is used, and show that it is independent of data heterogeneity for any momentum coefficient $\beta \in [0, 1)$. Through experiments, we demonstrate that Momentum Tracking is more robust to data heterogeneity than existing decentralized learning methods with momentum and consistently outperforms them when the data distributions are heterogeneous.
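The exact Momentum Tracking recursion is given in the paper; as a rough, hypothetical sketch, one way to combine a tracked gradient estimate with heavy-ball momentum looks like this (same illustrative names as the GT sketch above):

```python
# Hedged sketch: applying heavy-ball momentum to the *tracked* gradient y
# rather than the raw local gradient. This illustrates the idea only; the
# exact Momentum Tracking update may differ from what is shown here.
import numpy as np

def momentum_tracking_step(W, x, y, m, grad, lr, beta):
    g_old = grad(x)
    m_new = beta * m + y                  # momentum buffer driven by the tracker
    x_new = W @ (x - lr * m_new)          # descend along momentum, then mix
    y_new = W @ y + grad(x_new) - g_old   # standard tracking correction
    return x_new, y_new, m_new
```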
On the Performance of Gradient Tracking with Local Updates
Nguyen, Edward Duc Hien, Alghunaim, Sulaiman A., Yuan, Kun, Uribe, César A.
We study the decentralized optimization problem where a network of $n$ agents seeks to minimize the average of a set of heterogeneous non-convex cost functions in a distributed manner. State-of-the-art decentralized algorithms like Exact Diffusion (ED) and Gradient Tracking (GT) communicate at every iteration. However, communication is expensive, resource intensive, and slow. In this work, we analyze a locally updated GT method (LU-GT), where agents perform local recursions before interacting with their neighbors. While local updates have been shown to reduce communication overhead in practice, their theoretical influence has not been fully characterized. We show that LU-GT has the same communication complexity as the Federated Learning setting but allows arbitrary network topologies. In addition, we prove that the number of local updates does not degrade the quality of the solution achieved by LU-GT. Numerical examples reveal that local updates can lower communication costs in certain regimes (e.g., well-connected graphs).
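Schematically, the local-update pattern analyzed for LU-GT interleaves several communication-free recursions with a single mixing round; the sketch below conveys that pattern under the same assumed names as above, and is not the paper's exact recursion.

```python
# Illustrative sketch of local updates in a GT-style method: K local steps
# (no communication), then one mixing round. The precise LU-GT recursion is
# given in the paper; this only conveys the communication pattern.
import numpy as np

def lu_gt_round(W, x, y, grad, lr, K):
    for _ in range(K):            # K local recursions between communications
        g_old = grad(x)
        x = x - lr * y            # local descent along the tracker
        y = y + grad(x) - g_old   # keep the tracker up to date locally
    return W @ x, W @ y           # one communication (mixing) round
```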
Communication-Efficient Distributed Optimization in Networks with Gradient Tracking
Li, Boyue, Cen, Shicong, Chen, Yuxin, Chi, Yuejie
There is a growing interest in large-scale machine learning and optimization over decentralized networks, e.g., in the context of multi-agent learning and federated learning. Due to the imminent need to alleviate the communication burden, the investigation of communication-efficient distributed optimization algorithms, particularly for empirical risk minimization, has flourished in recent years. A large fraction of these algorithms have been developed for the master/slave setting, relying on the presence of a central parameter server that can communicate with all agents. This paper focuses on distributed optimization over the network-distributed or decentralized setting, where each agent is only allowed to aggregate information from its neighbors over a network (namely, no centralized coordination is present). By properly adjusting the global gradient estimate via a tracking term, we develop a communication-efficient approximate Newton-type method, called Network-DANE, which generalizes DANE [Shamir et al., 2014] to decentralized networks. We establish linear convergence of Network-DANE for quadratic losses, which sheds light on the impact of data homogeneity and network connectivity upon the rate of convergence. Our key algorithmic ideas can be applied, in a systematic manner, to obtain decentralized versions of other master/slave distributed algorithms. A notable example is our development of Network-SVRG, which employs stochastic variance reduction [Johnson and Zhang, 2013] at each agent to accelerate local computation. The proposed algorithms are built upon the primal formulation without resorting to the dual. Numerical evidence is provided to demonstrate the appealing performance of our algorithms over competitive baselines, in terms of both communication and computation efficiency.
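For concreteness, a DANE-style local subproblem with a tracking-corrected gradient estimate can be sketched as follows; `g_i` stands for agent i's tracked estimate of the global gradient, and the solver choice and all names are assumptions for illustration.

```python
# Sketch of a DANE-style local subproblem, with the agent's own gradient at
# x_t replaced (via the correction term) by a tracked global-gradient
# estimate g_i. Names and solver choice are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

def local_subproblem(f_i, grad_f_i, x_t, g_i, mu):
    """argmin_x  f_i(x) - <grad_f_i(x_t) - g_i, x> + (mu/2)||x - x_t||^2"""
    def objective(x):
        correction = grad_f_i(x_t) - g_i
        return f_i(x) - correction @ x + 0.5 * mu * np.sum((x - x_t) ** 2)
    return minimize(objective, x_t).x   # warm-start at the current iterate
```

The regularization weight `mu` trades off how far each local solve may move from the current iterate, which is what lets the method exploit data homogeneity for faster convergence.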